Data Replication Assignment

Introduction

This dataset comes from Winchester et al.’s (2014) article, “Dental Topography of Platyrrhines and Prosimians: Convergence and Contrasts.” The main question of the article asks if dietary preference can be figured out based on tooth measurements and, if so, are platyrrhines showing enough topographical variation to apply these models to extinct taxa? After running a mix of ANOVA, PGLS, and DFA, main results seem to conclude that platyrrhines are broadly similar to the prosimians. However, using dietary categories to assign groups can still be accurately assigned 82% of the time with a prosimian-only sample and 94% of the time with a platyrrhine only sample, with a combined percentage of 77% when testing “unknown” individuals. For clarity, here is a list of the main topographic measurements done in this study on platyrrhine samples:

Variables measured: Five variables on M2

  1. Shearing Ratio (SR)- measure of length of molar shearing crest relative to body size
  2. Shearing Quotient (SQ)
  3. Dirichlet normal energy (DNE)- measure of total curvature across a tooth surface
  4. Relief Index (RFI)- Ratio of the three-dimensional true surface area of a tooth to the area occupied by the tooth surface’s projection into a 2-D dimensional occlusal plane
  5. Orientation patch count-rotated (OPCR)- the number of discrete contiguous areas of similar aspect

For my replication, I chose to focus on testing each variable for normality, using boxplots to illustrate relationships between each variable when separated by prosimians and platyrrhines (Figure 2), the variability between each topographic measurement using ANOVA and linear models (Table 3), and using a linear discriminant analysis to test each topographic variable against diet based on group (Figure 3 and Figure 4).

Load Packages and Data

First, we will read in the dataset and check for normality.

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
# Load in specimen name/variable dataset and name it "SpecData"
SpecData <- read.csv("specvariable_repdata.csv")

# Quick look at the dataset
head(SpecData)
##      Specimen       Group                         Species M2.Length   RFI
## 1 USNM 171063 Platyrrhini Alouatta palliata aequatorialis      7.34 0.569
## 2 USNM 284782 Platyrrhini Alouatta palliata aequatorialis      8.03 0.594
## 3 USNM 290601 Platyrrhini Alouatta palliata aequatorialis      7.83 0.547
## 4  SB N Al 13 Platyrrhini              Alouatta seniculus      7.91 0.572
## 5 USNM 123517 Platyrrhini              Alouatta seniculus      7.75 0.536
## 6 USNM 281658 Platyrrhini              Alouatta seniculus      7.73 0.557
##       DNE   OPCR Shearing.ratio Shearing.quotient     Diet
## 1 167.027 52.250          3.050            -0.174 Folivory
## 2 207.057 53.625          3.362             2.719 Folivory
## 3 178.497 53.250          3.002            -0.676 Folivory
## 4 170.393 51.625          3.092            -0.830 Folivory
## 5 172.125 54.000          2.986            -1.486 Folivory
## 6 170.079 54.625          3.030            -0.929 Folivory
#Quick summary of dataset values
summary(SpecData)
##         Specimen           Group                        Species   
##  AMNH 100503:  2   Platyrrhini:111   Brachyteles arachnoides: 10  
##  AMNH 100632:  2   Prosimii   :121   Aotus nigriceps        :  8  
##  AMNH 100640:  2                     Alouatta seniculus     :  7  
##  AMNH 100829:  2                     Avahi laniger          :  7  
##  AMNH 100832:  2                     Microcebus griseorufus :  7  
##  AMNH 170461:  2                     Saimiri boliviensis    :  7  
##  (Other)    :220                     (Other)                :186  
##    M2.Length          RFI              DNE              OPCR      
##  Min.   :1.500   Min.   :0.3250   Min.   : 70.44   Min.   :30.12  
##  1st Qu.:3.350   1st Qu.:0.4878   1st Qu.:133.43   1st Qu.:47.09  
##  Median :4.100   Median :0.5135   Median :162.25   Median :52.88  
##  Mean   :4.548   Mean   :0.5122   Mean   :171.63   Mean   :54.75  
##  3rd Qu.:5.702   3rd Qu.:0.5533   3rd Qu.:203.84   3rd Qu.:60.09  
##  Max.   :8.720   Max.   :0.6730   Max.   :354.60   Max.   :94.88  
##                                                                   
##  Shearing.ratio  Shearing.quotient                  Diet   
##  Min.   :2.248   Min.   :-8.0660   Folivory           :50  
##  1st Qu.:2.659   1st Qu.:-1.8540   Frugivory          :62  
##  Median :2.971   Median : 0.6645   Hard-Object Feeding:47  
##  Mean   :2.994   Mean   : 2.5183   Insectivory        :25  
##  3rd Qu.:3.301   3rd Qu.: 5.4460   Omnivory           :48  
##  Max.   :4.167   Max.   :25.9420                           
##  NA's   :6       NA's   :6

Normality Check

Now I will use qqplots to check for normality in each of the topographic variables.

# For M2 Length
qqnorm(SpecData$M2.Length, main="M2.Length Normality Check")
qqline(SpecData$M2.Length, col="black")

# For OPCR
qqnorm(SpecData$OPCR, main="OPCR Normality Check")
qqline(SpecData$OPCR, col="black")

qqnorm(SpecData$Shearing.ratio, main="Shearing Ratio Normality Check")
qqline(SpecData$Shearing.ratio, col="black")

qqnorm(SpecData$Shearing.quotient, main="Shearing Quotient Normality Check")
qqline(SpecData$Shearing.quotient, col="black")

qqnorm(SpecData$DNE, main="DNE Normality Check")
qqline(SpecData$DNE, col="black")

RFI <- qqnorm(SpecData$RFI, main="RFI Normality Check")
qqline(SpecData$RFI, col="black")

As can be seen, the plots seem to skew off the best fit line at each end. This is likely due to some extremes in each topographic variable that skews out the expected distribution.

Dental Topographic Variation

Now that we have the data loaded in, we want to recreate the first set of boxplots (Figure 2) based on dental topography measurements for 5 types of feeding behaviors measured in the Winchester et al. (2014) paper.

# Recreating Figure 2: Boxplots of each Dental topography to the type of diet
library(forcats)
OPCR <- SpecData %>%
  mutate(Diet = fct_relevel(Diet, 
            "Insectivory", "Folivory", "Omnivory", 
            "Frugivory", "Hard-Object Feeding")) %>%
  ggplot( aes(x=Diet, y=OPCR)) +
    geom_boxplot(aes(fill=Group)) +
    xlab("Diet")

SR <- SpecData %>%
  mutate(Diet = fct_relevel(Diet, 
            "Insectivory", "Folivory", "Omnivory", 
            "Frugivory", "Hard-Object Feeding")) %>%
  ggplot( aes(x=Diet, y=Shearing.ratio)) +
    geom_boxplot(aes(fill=Group)) +
    xlab("Diet")

SQ <- SpecData %>%
  mutate(Diet = fct_relevel(Diet, 
            "Insectivory", "Folivory", "Omnivory", 
            "Frugivory", "Hard-Object Feeding")) %>%
  ggplot( aes(x=Diet, y=Shearing.quotient)) +
    geom_boxplot(aes(fill=Group)) +
    xlab("Diet")

DNE <- SpecData %>%
  mutate(Diet = fct_relevel(Diet, 
            "Insectivory", "Folivory", "Omnivory", 
            "Frugivory", "Hard-Object Feeding")) %>%
  ggplot( aes(x=Diet, y=DNE)) +
    geom_boxplot(aes(fill=Group)) +
    xlab("Diet")

RFI <- SpecData %>%
  mutate(Diet = fct_relevel(Diet, 
            "Insectivory", "Folivory", "Omnivory", 
            "Frugivory", "Hard-Object Feeding")) %>%
  ggplot( aes(x=Diet, y=RFI)) +
    geom_boxplot(aes(fill=Group)) +
    xlab("Diet")

#Call each to check on the data, note that 6 rows are removed in the shearing.ratio and shearing.quotient due to lack of data for Daubentonidae
OPCR

SR 
## Warning: Removed 6 rows containing non-finite values (stat_boxplot).

SQ 
## Warning: Removed 6 rows containing non-finite values (stat_boxplot).

DNE

RFI

Winchester et al. (2014) Figure 2

Here is Figure 2 from the Winchester Paper. My replicated versions match the original versions, however, mine lack some outlier points opposed to the original boxplots.

ANOVA and Linear Modeling

We will next include two-way ANOVAs for the platyrrhine sample by dietary group factor. But first, we should look at quick plots to see how our data pans out.

#Checking the Class of Group and Diet
class(SpecData$Group)
## [1] "factor"
class(SpecData$Diet)
## [1] "factor"
# Making a quick plot to look at the differences in dietary type
plot (SpecData$Diet ~ SpecData$Group)

Now that we can visualize the data, let’s make our first linear model to test if there are significant differences between each based on Group, Diet, and Group x Diet. This shows the variation in each topographic factor within each Group that is explained by Diet. The analysis here replicates Table 3 from the Winchester paper. I will also create a linear model equation for each in order to verify the p-values stated in the Winchester et al. paper.

# Running 2-Way ANOVA for DNE
DNEaov <- aov(data = SpecData, DNE~ Group * Diet)
summary(DNEaov)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Group         1 145009  145009   230.1  < 2e-16 ***
## Diet          4 348229   87057   138.2  < 2e-16 ***
## Group:Diet    3  28923    9641    15.3 4.36e-09 ***
## Residuals   223 140516     630                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  # Quick Plot of DNE ANOVA
  par(mfrow = c(2, 2))
  plot(DNEaov)

# Running linear model for DNE to retrieve p-value
DNElm <- lm(data = SpecData, DNE~ Group * Diet)
summary(DNElm)
## 
## Call:
## lm(formula = DNE ~ Group * Diet, data = SpecData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -63.671 -12.837  -1.249  14.914  83.322 
## 
## Coefficients: (1 not defined because of singularities)
##                                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)                            174.618      5.613  31.110  < 2e-16
## GroupProsimii                           38.932      7.246   5.373 1.95e-07
## DietFrugivory                          -25.770      6.874  -3.749 0.000227
## DietHard-Object Feeding                -55.961      6.846  -8.174 2.24e-14
## DietInsectivory                         57.732      6.798   8.493 2.86e-15
## DietOmnivory                             9.657      9.722   0.993 0.321644
## GroupProsimii:DietFrugivory            -48.803      9.844  -4.958 1.41e-06
## GroupProsimii:DietHard-Object Feeding  -81.699     13.149  -6.213 2.52e-09
## GroupProsimii:DietInsectivory               NA         NA      NA       NA
## GroupProsimii:DietOmnivory             -39.952     11.494  -3.476 0.000612
##                                          
## (Intercept)                           ***
## GroupProsimii                         ***
## DietFrugivory                         ***
## DietHard-Object Feeding               ***
## DietInsectivory                       ***
## DietOmnivory                             
## GroupProsimii:DietFrugivory           ***
## GroupProsimii:DietHard-Object Feeding ***
## GroupProsimii:DietInsectivory            
## GroupProsimii:DietOmnivory            ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.1 on 223 degrees of freedom
## Multiple R-squared:  0.788,  Adjusted R-squared:  0.7804 
## F-statistic: 103.6 on 8 and 223 DF,  p-value: < 2.2e-16

Now that we have one done, we will do the same for RFI, OPCR, SR, and SQ. You will notice that the F values vary, but the p-values stay consistent to the table.

# Running 2-Way ANOVA for RFI
RFIaov <- aov(data = SpecData, RFI~ Group * Diet)
summary(RFIaov)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Group         1 0.0516 0.05161   56.75 1.23e-12 ***
## Diet          4 0.5316 0.13289  146.11  < 2e-16 ***
## Group:Diet    3 0.0471 0.01571   17.27 4.03e-10 ***
## Residuals   223 0.2028 0.00091                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  # Quick Plot of DNE ANOVA
  par(mfrow = c(2, 2))
  plot(RFIaov)

  # Running linear model for RFI to retrieve p-value
  RFIlm <- lm(data = SpecData, RFI~ Group * Diet)
  summary(RFIlm)
## 
## Call:
## lm(formula = RFI ~ Group * Diet, data = SpecData)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.086227 -0.016993 -0.000298  0.016050  0.091880 
## 
## Coefficients: (1 not defined because of singularities)
##                                        Estimate Std. Error t value
## (Intercept)                            0.580100   0.006744  86.023
## GroupProsimii                         -0.050267   0.008706  -5.774
## DietFrugivory                         -0.051150   0.008259  -6.193
## DietHard-Object Feeding               -0.083734   0.008226 -10.180
## DietInsectivory                        0.051287   0.008167   6.280
## DietOmnivory                          -0.032800   0.011680  -2.808
## GroupProsimii:DietFrugivory           -0.067456   0.011827  -5.704
## GroupProsimii:DietHard-Object Feeding -0.082266   0.015798  -5.208
## GroupProsimii:DietInsectivory                NA         NA      NA
## GroupProsimii:DietOmnivory            -0.007665   0.013809  -0.555
##                                       Pr(>|t|)    
## (Intercept)                            < 2e-16 ***
## GroupProsimii                         2.58e-08 ***
## DietFrugivory                         2.81e-09 ***
## DietHard-Object Feeding                < 2e-16 ***
## DietInsectivory                       1.75e-09 ***
## DietOmnivory                           0.00542 ** 
## GroupProsimii:DietFrugivory           3.70e-08 ***
## GroupProsimii:DietHard-Object Feeding 4.34e-07 ***
## GroupProsimii:DietInsectivory               NA    
## GroupProsimii:DietOmnivory             0.57940    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03016 on 223 degrees of freedom
## Multiple R-squared:  0.7566, Adjusted R-squared:  0.7478 
## F-statistic: 86.62 on 8 and 223 DF,  p-value: < 2.2e-16
# Running 2-Way ANOVA for OPCR
OPCRaov <- aov(data = SpecData, OPCR~ Group * Diet)
summary(OPCRaov)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Group         1  13848   13848  175.24  < 2e-16 ***
## Diet          4   3730     933   11.80 1.03e-08 ***
## Group:Diet    3   2499     833   10.54 1.64e-06 ***
## Residuals   223  17622      79                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  # Quick Plot of DNE ANOVA
  par(mfrow = c(2, 2))
  plot(OPCRaov)

  # Running linear model for DNE to retrieve p-value
  OPCRlm <- lm(data = SpecData, OPCR~ Group * Diet)
  summary(OPCRlm)
## 
## Call:
## lm(formula = OPCR ~ Group * Diet, data = SpecData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.668  -5.475  -0.207   5.177  34.369 
## 
## Coefficients: (1 not defined because of singularities)
##                                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)                             53.375      1.988  26.852  < 2e-16
## GroupProsimii                           -2.869      2.566  -1.118  0.26474
## DietFrugivory                            8.900      2.434   3.656  0.00032
## DietHard-Object Feeding                 16.293      2.425   6.720 1.50e-10
## DietInsectivory                          2.365      2.407   0.982  0.32692
## DietOmnivory                             2.400      3.443   0.697  0.48647
## GroupProsimii:DietFrugivory            -17.363      3.486  -4.981 1.27e-06
## GroupProsimii:DietHard-Object Feeding  -20.049      4.657  -4.305 2.50e-05
## GroupProsimii:DietInsectivory               NA         NA      NA       NA
## GroupProsimii:DietOmnivory              -8.511      4.070  -2.091  0.03766
##                                          
## (Intercept)                           ***
## GroupProsimii                            
## DietFrugivory                         ***
## DietHard-Object Feeding               ***
## DietInsectivory                          
## DietOmnivory                             
## GroupProsimii:DietFrugivory           ***
## GroupProsimii:DietHard-Object Feeding ***
## GroupProsimii:DietInsectivory            
## GroupProsimii:DietOmnivory            *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.89 on 223 degrees of freedom
## Multiple R-squared:  0.5326, Adjusted R-squared:  0.5158 
## F-statistic: 31.76 on 8 and 223 DF,  p-value: < 2.2e-16
# Running 2-Way ANOVA for SR
SRaov <- aov(data = SpecData, Shearing.ratio~ Group * Diet)
summary(SRaov)
##              Df Sum Sq Mean Sq F value  Pr(>F)    
## Group         1 13.194  13.194 217.058 < 2e-16 ***
## Diet          4 17.744   4.436  72.980 < 2e-16 ***
## Group:Diet    2  0.572   0.286   4.705 0.00999 ** 
## Residuals   218 13.251   0.061                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 6 observations deleted due to missingness
  # Quick Plot of SR ANOVA
  par(mfrow = c(2, 2))
  plot(SRaov)

  # Running linear model for SR to retrieve p-value
  SRlm <- lm(data = SpecData, Shearing.ratio~ Group * Diet)
  summary(SRlm)
## 
## Call:
## lm(formula = Shearing.ratio ~ Group * Diet, data = SpecData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58768 -0.14935 -0.03402  0.12855  1.41893 
## 
## Coefficients: (2 not defined because of singularities)
##                                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)                            3.20250    0.05513  58.091  < 2e-16
## GroupProsimii                          0.32123    0.07117   4.513 1.04e-05
## DietFrugivory                         -0.38500    0.06752  -5.702 3.82e-08
## DietHard-Object Feeding               -0.74543    0.06724 -11.085  < 2e-16
## DietInsectivory                        0.04071    0.06677   0.610  0.54270
## DietOmnivory                          -0.44940    0.09549  -4.706 4.48e-06
## GroupProsimii:DietFrugivory           -0.28605    0.09669  -2.959  0.00343
## GroupProsimii:DietHard-Object Feeding       NA         NA      NA       NA
## GroupProsimii:DietInsectivory               NA         NA      NA       NA
## GroupProsimii:DietOmnivory            -0.07402    0.11289  -0.656  0.51272
##                                          
## (Intercept)                           ***
## GroupProsimii                         ***
## DietFrugivory                         ***
## DietHard-Object Feeding               ***
## DietInsectivory                          
## DietOmnivory                          ***
## GroupProsimii:DietFrugivory           ** 
## GroupProsimii:DietHard-Object Feeding    
## GroupProsimii:DietInsectivory            
## GroupProsimii:DietOmnivory               
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2465 on 218 degrees of freedom
##   (6 observations deleted due to missingness)
## Multiple R-squared:  0.704,  Adjusted R-squared:  0.6945 
## F-statistic: 74.06 on 7 and 218 DF,  p-value: < 2.2e-16
# Running 2-Way ANOVA for SR
SQaov <- aov(data = SpecData, Shearing.quotient~ Group * Diet)
summary(SQaov)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Group         1   3458    3458 176.858  < 2e-16 ***
## Diet          4   3772     943  48.234  < 2e-16 ***
## Group:Diet    2    279     140   7.147 0.000985 ***
## Residuals   218   4262      20                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 6 observations deleted due to missingness
  # Quick Plot of SR ANOVA
  par(mfrow = c(2, 2))
  plot(SQaov)

  # Running linear model for SR to retrieve p-value
  SQlm <- lm(data = SpecData, Shearing.quotient~ Group * Diet)
  summary(SQlm)
## 
## Call:
## lm(formula = Shearing.quotient ~ Group * Diet, data = SpecData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.2992  -2.1801  -0.3344   1.9713  20.9385 
## 
## Coefficients: (2 not defined because of singularities)
##                                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)                           -0.01920    0.98871  -0.019 0.984524
## GroupProsimii                          6.56610    1.27642   5.144 5.98e-07
## DietFrugivory                          0.03303    1.21091   0.027 0.978267
## DietHard-Object Feeding               -4.08331    1.20598  -3.386 0.000841
## DietInsectivory                        9.59126    1.19738   8.010 6.83e-14
## DietOmnivory                           0.58320    1.71249   0.341 0.733765
## GroupProsimii:DietFrugivory           -6.55465    1.73398  -3.780 0.000202
## GroupProsimii:DietHard-Object Feeding       NA         NA      NA       NA
## GroupProsimii:DietInsectivory               NA         NA      NA       NA
## GroupProsimii:DietOmnivory            -3.67963    2.02455  -1.818 0.070513
##                                          
## (Intercept)                              
## GroupProsimii                         ***
## DietFrugivory                            
## DietHard-Object Feeding               ***
## DietInsectivory                       ***
## DietOmnivory                             
## GroupProsimii:DietFrugivory           ***
## GroupProsimii:DietHard-Object Feeding    
## GroupProsimii:DietInsectivory            
## GroupProsimii:DietOmnivory            .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.422 on 218 degrees of freedom
##   (6 observations deleted due to missingness)
## Multiple R-squared:  0.6379, Adjusted R-squared:  0.6263 
## F-statistic: 54.87 on 7 and 218 DF,  p-value: < 2.2e-16

Now that we have compared all of the df, F, and p-values to Table 3. We can see that there is a trend toward a larger F value than the ones recorded in Table 3. There is generally a trend of Group and Diet having larger values than the (Group * Diet) F-values.

LDA Analysis

Now we will try to run the first discriminant function analysis of first just prosimians.

library(dplyr)
library(MASS)

## Building a Model for Group based on Diet
model <- lda(Diet~DNE+RFI+OPCR+M2.Length, data = SpecData %>% filter(Group=='Prosimii'))
model
## Call:
## lda(Diet ~ DNE + RFI + OPCR + M2.Length, data = SpecData %>% 
##     filter(Group == "Prosimii"))
## 
## Prior probabilities of groups:
##            Folivory           Frugivory Hard-Object Feeding 
##          0.24793388          0.18181818          0.04958678 
##         Insectivory            Omnivory 
##          0.20661157          0.31404959 
## 
## Group means:
##                           DNE       RFI     OPCR M2.Length
## Folivory            213.55003 0.5298333 50.50583  5.486667
## Frugivory           138.97691 0.4112273 42.04318  4.818182
## Hard-Object Feeding  75.89017 0.3638333 46.75000  4.416667
## Insectivory         271.28220 0.5811200 52.87100  2.836000
## Omnivory            183.25521 0.4893684 44.39474  3.868421
## 
## Coefficients of linear discriminants:
##                    LD1         LD2          LD3         LD4
## DNE        -0.01589503 -0.01071809   0.02255529 -0.02297857
## RFI       -18.23048808  6.78089272 -12.03709733 22.84098987
## OPCR       -0.02329067 -0.06258873  -0.09283985 -0.07535709
## M2.Length   0.15158064 -0.74713694   0.16843943  0.01343694
## 
## Proportion of trace:
##    LD1    LD2    LD3    LD4 
## 0.8834 0.0888 0.0150 0.0129
predictions <- model %>% predict(SpecData)
names(predictions)
## [1] "class"     "posterior" "x"
# Predicted classes
head(predictions$class, 6)
## [1] Folivory Folivory Folivory Folivory Folivory Folivory
## Levels: Folivory Frugivory Hard-Object Feeding Insectivory Omnivory
# Predicted probabilities of class memebership.
head(predictions$posterior, 6) 
##    Folivory    Frugivory Hard-Object Feeding  Insectivory    Omnivory
## 1 0.9159127 3.057234e-04        2.061726e-07 0.0012453890 0.082536026
## 2 0.9888923 6.152336e-06        2.414418e-10 0.0043960416 0.006705521
## 3 0.9661582 6.706555e-04        2.750846e-07 0.0003241260 0.032846774
## 4 0.9537723 2.192882e-04        9.716749e-08 0.0006941445 0.045314188
## 5 0.9554777 1.739290e-03        1.288420e-06 0.0001830356 0.042598715
## 6 0.9609591 4.103401e-04        2.802489e-07 0.0004459847 0.038184319
# Linear discriminants
head(predictions$x, 3)
##          LD1       LD2        LD3      LD4
## 1 -0.4885521 -1.822708 -1.4349762 1.951717
## 2 -1.5080264 -2.683815 -0.8444470 1.508565
## 3 -0.2188135 -2.523510 -0.9217554 1.116878
# Make graph
lda.data <- cbind(SpecData %>% filter(Group=='Prosimii'), predict(model)$x)
ggplot(lda.data, aes(-LD1,-LD2), inherit.aes = FALSE) +
  geom_point(aes(color = Diet))

Next, we will do one for only Platyrrhines. It is important to note that the original paper does not include a platyrrhine-only DFA.

library(dplyr)
library(MASS)

## Building a Model for Group based on Diet
model2 <- lda(Diet~DNE+RFI+OPCR+M2.Length, data = SpecData %>% filter(Group=='Platyrrhini'))
## Warning in lda.default(x, grouping, ...): group Insectivory is empty
model2
## Call:
## lda(Diet ~ DNE + RFI + OPCR + M2.Length, data = SpecData %>% 
##     filter(Group == "Platyrrhini"))
## 
## Prior probabilities of groups:
##            Folivory           Frugivory Hard-Object Feeding 
##          0.18018018          0.36036036          0.36936937 
##            Omnivory 
##          0.09009009 
## 
## Group means:
##                          DNE       RFI     OPCR M2.Length
## Folivory            174.6179 0.5801000 53.37500  7.838500
## Frugivory           148.8478 0.5289500 62.27500  4.598450
## Hard-Object Feeding 118.6573 0.4963659 69.66768  4.186902
## Omnivory            184.2747 0.5473000 55.77500  2.780000
## 
## Coefficients of linear discriminants:
##                   LD1         LD2           LD3
## DNE        0.04122506  0.03510623   0.035299153
## RFI       26.96641606  2.99763767 -39.128427049
## OPCR      -0.04206473 -0.05342556  -0.009063623
## M2.Length  0.77659863 -0.93603879   0.315724242
## 
## Proportion of trace:
##    LD1    LD2    LD3 
## 0.8195 0.1805 0.0000
predictions <- model2 %>% predict(SpecData)
names(predictions)
## [1] "class"     "posterior" "x"
# Predicted classes
head(predictions$class, 6)
## [1] Folivory Folivory Folivory Folivory Folivory Folivory
## Levels: Folivory Frugivory Hard-Object Feeding Insectivory Omnivory
# Predicted probabilities of class memebership.
head(predictions$posterior, 6) 
##    Folivory    Frugivory Hard-Object Feeding     Omnivory
## 1 0.9999020 9.796597e-05        2.921329e-11 3.261446e-08
## 2 1.0000000 9.851119e-11        5.420283e-21 4.176638e-12
## 3 0.9999768 2.315618e-05        4.729675e-12 5.247475e-09
## 4 0.9999986 1.421014e-06        9.854092e-14 2.388784e-10
## 5 0.9994613 5.386199e-04        8.626135e-10 3.417559e-08
## 6 0.9999497 5.030303e-05        2.436862e-11 4.159582e-09
# Linear discriminants
head(predictions$x, 3)
##        LD1        LD2        LD3
## 1 4.363082 -0.8726762 0.02280032
## 2 7.165496 -0.1117598 0.66300197
## 3 4.581141 -1.0480404 1.43414825
# Make graph
lda.data2 <- cbind(SpecData %>% filter(Group=='Platyrrhini'), predict(model2)$x)
ggplot(lda.data2, aes(-LD1,-LD2), inherit.aes = FALSE) +
  geom_point(aes(color = Diet))

Finally, we will replicate Figure 4 from the paper to show a discriminant function analysis of topographic metrics and molar area for platyrrhines and prosimians.

library(dplyr)
library(MASS)

## Building a Model for Group based on Diet
model3 <- lda(Diet~DNE+RFI+OPCR+M2.Length, data = SpecData)
model3
## Call:
## lda(Diet ~ DNE + RFI + OPCR + M2.Length, data = SpecData)
## 
## Prior probabilities of groups:
##            Folivory           Frugivory Hard-Object Feeding 
##           0.2155172           0.2672414           0.2025862 
##         Insectivory            Omnivory 
##           0.1077586           0.2068966 
## 
## Group means:
##                          DNE       RFI     OPCR M2.Length
## Folivory            197.9772 0.5499400 51.65350  6.427400
## Frugivory           145.3453 0.4871774 55.09597  4.676419
## Hard-Object Feeding 113.1977 0.4794468 66.74202  4.216234
## Insectivory         271.2822 0.5811200 52.87100  2.836000
## Omnivory            183.4676 0.5014375 46.76562  3.641667
## 
## Coefficients of linear discriminants:
##                   LD1          LD2          LD3         LD4
## DNE       -0.03249541 -0.003744819 -0.008181301 -0.02150385
## RFI       -7.37543201  1.574661968 -1.017144473 23.69184454
## OPCR       0.05103574 -0.017905603 -0.082974649 -0.04326357
## M2.Length -0.07044613 -0.804654184 -0.018973693 -0.23370875
## 
## Proportion of trace:
##    LD1    LD2    LD3    LD4 
## 0.8019 0.1596 0.0380 0.0005
predictions <- model3 %>% predict(SpecData)
names(predictions)
## [1] "class"     "posterior" "x"
# Predicted classes
head(predictions$class, 6)
## [1] Folivory Folivory Folivory Folivory Folivory Folivory
## Levels: Folivory Frugivory Hard-Object Feeding Insectivory Omnivory
# Predicted probabilities of class memebership.
head(predictions$posterior, 6) 
##    Folivory    Frugivory Hard-Object Feeding  Insectivory    Omnivory
## 1 0.9216942 0.0575717561        2.940098e-04 2.092461e-05 0.020419119
## 2 0.9982200 0.0005750234        2.556513e-07 1.487040e-04 0.001056058
## 3 0.9751393 0.0194941452        6.293818e-05 1.107185e-05 0.005292510
## 4 0.9741677 0.0192914560        6.011584e-05 1.058124e-05 0.006470191
## 5 0.9448672 0.0472935560        2.706440e-04 5.854642e-06 0.007562723
## 6 0.9501963 0.0416342125        2.415409e-04 7.620188e-06 0.007920284
# Linear discriminants
head(predictions$x, 3)
##          LD1       LD2         LD3         LD4
## 1 -0.5934099 -2.095032  0.13440156  0.90018043
## 2 -2.0570207 -2.785402 -0.34570652  0.41093108
## 3 -0.7873557 -2.584814 -0.02933254 -0.02547014
# Make graph
lda.data3 <- cbind(SpecData, predict(model3)$x)
ggplot(lda.data3, aes(-LD1,-LD2), inherit.aes = FALSE) +
  geom_point(aes(color = Diet, shape = Group))

Success! Now we have successfully replicated Figure 4. You’ll notice the main difference is that color is used in this graph to discern Diet whereas shapes are used to discern Group. I do not think this is more effective than the shape drawing but I do think that the addition of color helps quickly differentiate between Group and Diet.

Conclusion

After running these analyses, I have come to the overall conclusion that Winchester et al. (2014) have allowed for highly replicable research. Their datasets are complete and they explain many of their shortcomings throughout. I did notice that when using qqnorm() for each topograpic variable, there seemes to be a lot of non-normal skewing on the ends. This seems to be due to extreme outliers in the dataset.

When replicating Figure 2, I found that the graphs were almost exactly the same and found it fairly simple to replicate them. Using ANOVA and linear models to test the information in Table 3 was also a good test on earlier modules from the course. However, I noticed that the F statistics differed greatly in comparison to the original Table 3 values. I was unable to find a specific reason for this, but believe that it may have something to do with the different statistics programming software used by Winchester et al.

The most rewarding and also most difficult to replicate were Figures 3 and 4. Once Figure 3 was created, I had no problem creating a platyrrhine-only figure. This figure was not included in the original paper even though I think it would be a useful visualization of the paper’s sample. The creation of Figure 4 gave the most robust visualization of the dataset.

Overall, this was a positive experience that allowed me to dig into a dataset and test replicability.